How the brain extracts and combines temporal 
structure (rhythm) information from events presented to different 
senses remains unresolved. Many neuroimaging beat perception 
studies have focused on the auditory domain and show that the 
presence of a highly regular beat (isochrony) in "auditory" stimulus 
streams enhances neural responses in a distributed brain network 
and affects perceptual performance. Here, we acquired functional 
magnetic resonance imaging (fMRI) measurements of brain activity 
while healthy human participants performed a visual task on 
isochronous versus randomly timed "visual" streams, with or 
without concurrent task-irrelevant sounds. We found that visual 
detection of higher intensity oddball targets was better for 
isochronous than randomly timed streams, extending previous 
auditory findings to vision. The impact of isochrony on visual target 
sensitivity correlated positively with fMRI signal changes not only 
in visual cortex but also in auditory sensory cortex during 
audiovisual presentations. Visual isochrony activated a similar 
timing-related brain network to that previously found primarily in 
auditory beat perception work. Finally, activity in multisensory left 
posterior superior temporal sulcus increased specifically during 
concurrent isochronous audiovisual presentations. These results 
indicate that regular isochronous timing can modulate visual 
processing and this can also involve multisensory audiovisual brain 
mechanisms. 
Introduction 

Our senses are continuously bombarded by a plethora of events 
in our environment. These events often carry rich timing 
information that can be used to determine the relationship 
between inputs within or between different sensory modalities. 
This is apparent when listening to and observing music being 
played. By watching the lead violinist, we are better able to 
extract the stream of individual notes they play from the 
complex auditory input generated by the full orchestra. By 
listening to a soloist play we are better able to predict the series 
of movements they will make than when watching a muted 
recording. In these 2 examples, synchronous audiovisual pre- 
sentations enhance understanding of component events, in 
particular their underlying temporal structure (rhythm). How 
the brain extracts and combines timing information from events 
within and between different sensory modalities has been little 
explored. Here, we focused on how 
audiovisual presentations manipulate brain responses to visual 
stimulus trains with different temporal structures. 
Knowing when an event will occur can influence perception, 
resulting in both speeded reaction times (e.g., Coull and 
Nobre 1998; Davranche et al. 2011; Griffin et al. 2001) and 
improved accuracy for judging that event (Correa et al. 2004, 
2005; Martens and Johnson 2005; Davranche et al. 2011). Much 
work on this topic has used temporal "orienting" cues that 
indicate the likely onset time for a target stimulus either 
symbolically or via the timing of another event (e.g., Bertelson 
1967; Bertelson and Tisseyre 1968; Niemi and Näätänen 1981; 
Coull and Nobre 1998; Coull et al. 2000; Griffin et al. 2001; 
Nobre 2001; Correa et al. 2004, 2005; Martens and Johnson 
2005; Davranche et al. 2011). Other studies have examined 
how the rhythm or global temporal structure of a stimulus train 
can provide temporal information regarding onset of a critical 
target event (e.g., Jones et al. 2002, 2006; Coull and Nobre 
2008; Rimmele et al. 2011; Rohenkohl et al. 2011). For instance, 
trains of regularly timed (isochronous) stimuli that predict the 
onset of a final event can enhance perceptual judgment of that 
event in audition (Jones et al. 2002; Rimmele et al. 2011), 
although reportedly not in vision (Doherty et al. 2005). The 
apparent difference between the impact of isochrony on these 
2 modalities might potentially reflect the better temporal 
resolution of audition than vision (Mabbott 1951), as shown for 
instance by timing judgments (e.g., Recanzone 2003; Merchant 
et al. 2008; Grondin and McAuley 2009) or by rhythm 
reproduction or recall (e.g., Glenberg et al. 1989; Glenberg 
and Jona 1991; Repp and Penel 2004; Kato and Konishi 2006; 
Mayer et al. 2009). Here, we manipulated isochrony for the 
timing of visual stimulus trains in a visual task, while also 
manipulating whether (task-irrelevant) synchronous sounds 
were present or not, using both behavioral and functional 
magnetic resonance imaging (fMRI) measures of brain activity. 

Neuroimaging studies investigating processing of timing in 
the brain have consistently reported activations in a cortico- 
striatal network, including the supplementary motor area 
(SMA), dorsolateral prefrontal cortex (DLPFC), inferior frontal 
gyrus (IFG), insula and basal ganglia (e.g., Ferrandez et al. 2003; 
Coull et al. 2004; Livesey et al. 2007; Macar et al. 2006; Meck 
et al. 2008; Kosillo and Smith 2010; Harrington et al. 2011). 
Activity in all these regions is typically enhanced during 
isochronous beat-containing auditory stimuli, compared with 
less structured or more complex timing conditions (e.g., Grahn 
and Brett 2007; Bengtsson et al. 2009; Teki et al. 2011), with 
responses in the IFG and insula relating to beat perception 
strength (Grahn and McAuley 2009). The majority of such beat 
perception studies have focused on active monitoring of 
rhythms presented in audition, whereas here we instead 
examined the possible impact of isochrony for visual stimulus 
trains. We note that some of the literature using temporal 
orienting cues, rather than isochronous trains, has already 
identified the involvement of left inferior parietal cortex in 
implicit timing tasks for vision (e.g., Assmus et al. 2003, 2005; 
Coull and Nobre 2008; Wiener et al. 2010; Cotti et al. 2011; 
Davranche et al. 2011). 

Some effects of audiovisual timing manipulations have also 
been observed for a multisensory region of posterior superior 
temporal sulcus (pSTS; e.g., Calvert et al. 2000; Calvert 2001; 
Macaluso et al. 2004; Noesselt et al. 2007; Stevenson et al. 2010; 
Marchant et al. 2011). This region is thought to receive 
convergent inputs from auditory and visual cortices (Seltzer 
and Pandya 1989; Seltzer et al. 1996; Lewis and van Essen 2000) 
and is commonly reported as being activated during audiovisual 
integration (e.g., Calvert et al. 2000; Beauchamp et al. 2004; van 
Atteveldt et al. 2007; Stevenson and James 2009). For instance, 
activity in pSTS is typically greatest when audiovisual stimuli 
have simpler temporal structure, as most commonly manipu- 
lated by comparing synchronous to asynchronous stimuli (e.g., 
Calvert 2001; Macaluso et al. 2004; Noesselt et al. 2007; 
Stevenson et al. 2010; Marchant et al. 2011). In addition to 
whole-brain fMRI analyses, here we shall examine an a priori 
region of interest (ROI) in pSTS, to investigate any interaction 
effect between the impact of isochronous/random stream 
timing in vision and the presence/absence of concurrent 
auditory stimuli (audiovisual/vision-only). The coordinates for 
this pSTS ROI were taken from Noesselt et al. (2007), who 
utilized similar streams of simple flashes and beeps to those 
used here, while manipulating audiovisual synchrony in their 
study (see also Marchant et al. 2011, for use of the identical 
pSTS ROI). However, Noesselt et al. (2007) used only 
irregularly timed stimulus streams, whereas here we manipu- 
lated isochronous versus random timing for successive events 
within each stream. 

Some impacts of timing have also been observed for sensory- 
specific cortices. Isochronous auditory stimuli with their highly 
predictable temporal structure can enhance activity in auditory 
cortices (Grahn and Brett 2007; Bengtsson et al. 2009), 
although Teki et al. (2011) reported attenuation. Activity in 
visual cortex can increase at the expected onset of a visual 
event (Bueti et al. 2010), which may reflect orienting of 
attention to the correct time point (Coull and Nobre 1998). 

In the current study, we used trains of simple visual stimuli 
with either isochronous or pseudorandom timing. The behav- 
ioral task was to detect occasional higher intensity target 
events within each visual stream. The difference in intensity for 
such targets was titrated to avoid ceiling or floor effects in 
performance. The isochronous or pseudorandom timing of 
each visual stream gave no information about which item might 
be a higher intensity target, since intensity is fully orthogonal to 
timing. Nevertheless, we predicted that detection of intensity 
targets might be enhanced for the isochronous streams due to 
the predictable timing of events within them. As regards brain 
activity, we sought to test whether the timing network 
implicated in previous studies, involving parietal cortex, 
DLPFC, IFG, SMA, insula, and basal ganglia (Grahn and Brett 
2007; Coull and Nobre 2008; Bengtsson et al. 2009; Kosillo and 
Smith 2010; Wiener et al. 2010; Cotti et al. 2011; Davranche 
et al. 2011; Teki et al. 2011), might be implicated in 
isochronous streams for vision. We further manipulated the 
presence/absence of concurrent (but task-irrelevant) auditory 
events, to test whether this might enhance any impacts of 
isochrony on brain activations (for the above network, plus for 
pSTS) and potentially for any impacts of isochrony on visual 
target detection. Finally, given that timing manipulations can 
also affect sensory-specific cortex (for both visual and auditory 
cortex, see above), we examined regions of visual and auditory 
cortex that responded to our stimuli, testing whether their 
activity related to the impact of the timing manipulation upon 
sensory performance. 

Materials and Methods 

Participants 

Seventeen volunteers (age range 19-35 years, 9 females) with no 
history of neurological or psychiatric illness by self-disclosure gave 
written informed consent to participate and were reimbursed for their 
time. All had normal or corrected vision and normal hearing by self- 
report. Data from one participant were removed due to excessive 
movements during scanning. This study was approved by the University 
College London Research Ethics Committee and conducted in 
accordance with the Code of Ethics of the World Medical Association 
(Declaration of Helsinki). 

Experimental Set-Up 

Visual and auditory stimuli were presented using Cogent v1.25 (Vision 
Lab, University College London, UK; http://www.vislab.ucl.ac.uk/), 
running in MATLAB v6.5 (MathWorks Inc., Natick, MA) on a Windows 
PC. Visual stimuli were back-projected onto a screen (30° x 26°) using 
an LCD projector (LT158; NEC) visible to the participant inside the 
scanner via a mirror mounted on the MR head coil. Auditory stimuli 
were presented via etymotic earphones (E-A-RTONE 3A Insert Earphone, 
E-A-R Auditory Systems, Aearo Company, IN), and ear defenders 
were worn to reduce background scanner noise. Participants made 
responses on a 1-button fiber-optic keypad with their right index finger. 

Stimuli and Experimental Design 

Each trial was 14 s in duration and comprised on average 57 rapid visual 
events (range 36-141), of which up to 6 were higher intensity targets 
(mean 3). The standard visual stimulus was a red central annulus 
(33 ms, 8° va diameter, 2° va aperture, 0.06 cd/mm²), and the target 
stimulus was identical except brighter (by a mean ± standard deviation 
of 0.17 ± 0.86 cd/mm² across participants after individual titration). 
Target luminance was set for each participant prior to the main 
experiment to achieve approximately 75% hit-rate. Target events were 
prevented from occurring within 1.5 s of the start of a trial, the end of the 
trial, or another target event. Visual stimuli were presented on a black 
background and a white central fixation cross (0.5° va, 2.31 cd/mm²) 
remained visible throughout the experimental session (Fig. 1a). The 
intertrial interval was 2.01 s. Participants were instructed to make an 
immediate button press with their right index finger on detection of 
a brighter visual target. 

A 2 x 2 factorial design manipulated the timing of visual events 
within each stimulus train (isochronous/random) and whether or not 
a synchronous auditory tone (30 ms including 5 ms onset and offset 
ramp, 1000 Hz, 64 dB(A)) accompanied all visual events on that trial. 
Four possible stimulus-onset-asynchronies (SOAs; 100, 200, 300, and 
400 ms) were used between events. In the isochronous condition 
(ISO), all SOAs were identical throughout a trial, but different SOAs 
were used for different trials so there was an equal number of each SOA 
type presented overall (Fig. 1b). In the random condition (RAND), each 
of the 4 SOAs was equally likely to occur before each event (Fig. 1c). 
On half of the trials, the visual stimuli were presented alone (V: vision- 
only) and on the remainder of trials an auditory tone was presented in 
synchrony with each visual stimulus on that trial (VA: vision and 
audition, i.e., audiovisual). The auditory stimuli never provided any 
information about which visual event was a target because the same 
tone accompanied all visual events. 
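
The timing manipulation described above can be illustrated with a short sketch. The function name make_onsets, the use of Python's random module, and the assumption that an event may fall at both 0 s and exactly 14 s are illustrative choices, not details taken from the original Cogent/MATLAB scripts. Under those assumptions, isochronous trials contain between 36 and 141 events, which matches the reported range.

```python
import random

SOAS_MS = (100, 200, 300, 400)  # the four SOAs named in the text
TRIAL_MS = 14_000               # each trial lasted 14 s


def make_onsets(condition, rng, iso_soa=None):
    """Return event onset times (ms) for one trial.

    condition: "ISO" (one fixed SOA for the whole trial) or
               "RAND" (each SOA drawn with equal probability).
    Assumption: an event may fall exactly on the 14 s boundary.
    """
    if condition == "ISO" and iso_soa not in SOAS_MS:
        raise ValueError("isochronous trials need one of the four SOAs")
    onsets, t = [], 0
    while t <= TRIAL_MS:
        onsets.append(t)
        t += iso_soa if condition == "ISO" else rng.choice(SOAS_MS)
    return onsets


rng = random.Random(0)  # fixed seed so the example is repeatable
iso_counts = {soa: len(make_onsets("ISO", rng, soa)) for soa in SOAS_MS}
rand_onsets = make_onsets("RAND", rng)
```

Because the mean SOA in the random condition is 250 ms, a random trial generated this way contains on the order of 57 events, in line with the reported average.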

A total of 32 trials were presented for each of the 4 conditions (V_ISO, 
V_RAND, VA_ISO, and VA_RAND) per participant. Each participant performed 
4 functional imaging sessions (~10 min each), each comprising 32 trials 
(8 trials per experimental condition) plus 4 null trials (14 s) presented 
in a pseudorandomised order. Prior to the experimental sessions, an in-situ 
practice was performed inside the scanner to familiarize 
participants with the task, set visual target luminance, and ensure that 
experimental auditory stimuli were clearly audible while the scanner 
was running. A 2-min fieldmap scan and a 12-min structural MRI scan 
were also conducted. 

Figure 1. Schematic of visual task and behavioral performance measures. 
(a) Schematic of visual intensity-target detection task. (b and c) Timeline for 
stimulus onset during the 4 isochronous trial types (100, 200, 300, and 400 ms SOAs) 
and an exemplar random timing trial. (d) Visual target detection sensitivity (d') was 
enhanced and (e) RTs reduced by isochronous (light bars) compared with random 
timing conditions (dark bars), when visual stimuli were presented alone (V; blue bars) 
or accompanied by synchronous auditory tones (VA; red bars). Group means (±1 
standard error of the difference (s.e.d.) for isochrony effect). 

Behavioral Analysis 

The first button press occurring within 1.5 s (response time window) 
after a target stimulus was classified as a hit (i.e., correctly detected 
target). The response time window matched the minimum period 
allowed between target events. The hit-rate was calculated by dividing 
the total number of hits by the number of targets presented across trials 
per condition. Any other button presses (not falling within the 1.5 s post- 
target response window) were classified as false alarm responses, and the 
total of these were divided by the total number of nontarget events to 
produce a false alarm rate. Signal detection analysis was then used to 
combine the hit-rate and false alarm rate measures into a formal measure 
of target detection sensitivity (d' = Z(P_hits) - Z(P_false alarms)). Target 
detection sensitivity (d') and mean reaction time for hit responses (RT) 
were calculated for each of the 4 experimental conditions and then 
entered into 2 x 2 repeated measures analyses of variance (ANOVAs) 
(stimulus timing x presence of auditory tone). All statistical analyses on 
behavior were performed in SPSS v16.0 (SPSS Inc., Chicago). 
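
The scoring rules above can be sketched as follows. This is a minimal illustration and not the original analysis code: the function names are hypothetical, a second press inside a response window is assumed to count as a false alarm, and rates of exactly 0 or 1 are assumed to be clamped by 1/(2N), since the text does not state how such cases were handled.

```python
from statistics import NormalDist

WINDOW_S = 1.5  # response window after each target, as stated in the text


def count_hits_and_false_alarms(target_times, press_times):
    """Classify button presses (times in seconds, sorted ascending).

    The first unclaimed press within WINDOW_S after a target is a hit.
    Assumption: every other press, including a second press inside a
    window, counts as a false alarm.
    """
    claimed = set()
    for target in target_times:
        for i, press in enumerate(press_times):
            if i not in claimed and 0 <= press - target <= WINDOW_S:
                claimed.add(i)
                break
    return len(claimed), len(press_times) - len(claimed)


def d_prime(hits, n_targets, false_alarms, n_nontargets):
    """d' = Z(P_hits) - Z(P_false alarms).

    Assumption: rates of exactly 0 or 1 are clamped by 1/(2N) so that
    the inverse normal CDF stays finite.
    """
    def clamp(rate, n):
        return min(max(rate, 1 / (2 * n)), 1 - 1 / (2 * n))

    z = NormalDist().inv_cdf
    return (z(clamp(hits / n_targets, n_targets))
            - z(clamp(false_alarms / n_nontargets, n_nontargets)))
```

Equal hit and false alarm rates give d' = 0 under this definition, and d' grows as the two rates separate.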

Scanning Protocol 

A Siemens 3T Allegra MRI (Siemens, Erlangen, Germany) with head coil 
system was used to acquire high-resolution T1-weighted anatomical 
images (176 sagittal slices, field of view [FoV] = 256 x 240 mm, 
1 mm³ voxel size); fieldmap images (double-echo FLASH, echo times 
TE1 = 10 ms and TE2 = 12.47 ms, 3 x 3 x 2 mm resolution and 1-mm 
interslice gap); and T2*-weighted echoplanar functional images for 
blood oxygen level-dependent (BOLD) contrast (40 slices, 2-mm slice 
thickness and 1-mm gap, 3-mm resolution in plane, slice TE = 30 ms, 
volume repetition time = 2.4 s, 64 x 64 matrix). An EPI sequence with 
a sinusoidal readout and lower slew rates was used to reduce acoustic 
noise, although this was still audible throughout. Four task EPI sessions 
of 253 volumes were collected, and the first 6 volumes were discarded 
to allow for T1 equilibrium effects. 



fMRI Preprocessing and First-Level Analysis 

The fMRI data were analyzed using statistical parametric mapping with 
SPM5 software (http://www.fil.ion.ucl.ac.uk/spm; see Friston et al. 
1995). Scans from each participant were realigned using the first as 
a reference, unwarped incorporating fieldmap distortion information, 
spatially normalized into Montreal Neurological Institute (MNI) standard 
space (Evans et al. 1992, 1993), resampled to 3 x 3 x 3 mm³ voxels and 
then spatially smoothed with a Gaussian kernel of 6 mm full-width at 
half-maximum, in accord with the standard SPM approach. 

The 4 experimental trial types (stimulus timing x presence of auditory 
tone) were modeled as separate regressors with a 14 s boxcar spanning 
the trial duration. A first-order parametric function was used to model 
activity associated with the different number of events within each trial, 
due to the varied SOA. A stick function regressor for button presses 
across all trials modeled variance due to target detection and associated 
motor responses. All regressors were convolved with the haemodynamic 
response function with both temporal and dispersion derivatives. Six 
head movement regressors created during the realignment preprocessing 
were also included. 
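
To make the trial regressors concrete, the following sketch builds a 14 s boxcar, convolves it with a haemodynamic response function, and samples the result once per volume. It is an approximation under stated assumptions and not the SPM5 implementation: the double-gamma shape used here is a commonly used form, the 0.1 s grid is an arbitrary choice, all function names are hypothetical, and the temporal and dispersion derivatives mentioned above are omitted.

```python
import math

TR_S = 2.4      # volume repetition time from the scanning protocol
TRIAL_S = 14.0  # boxcar length, spanning one trial
DT_S = 0.1      # fine time grid for the convolution (illustrative choice)


def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def hrf(t):
    """Double-gamma response: peak near 5 s, undershoot near 15 s."""
    return gamma_pdf(t, 6) - gamma_pdf(t, 16) / 6


def boxcar_regressor(onsets_s, n_scans):
    """Convolve 14 s boxcars with the HRF and sample once per volume."""
    n_fine = int(round(n_scans * TR_S / DT_S))
    box = [0.0] * n_fine
    for onset in onsets_s:
        start = int(round(onset / DT_S))
        stop = int(round((onset + TRIAL_S) / DT_S))
        for i in range(start, min(stop, n_fine)):
            box[i] = 1.0
    # Discretised kernel covering 32 s of post-stimulus time.
    kernel = [hrf(k * DT_S) * DT_S for k in range(int(round(32 / DT_S)))]
    regressor = []
    for scan in range(n_scans):
        i = int(round(scan * TR_S / DT_S))
        regressor.append(sum(kernel[k] * box[i - k]
                             for k in range(min(i + 1, len(kernel)))))
    return regressor
```

For a single trial starting at time zero, the predicted response rises over the first few volumes, peaks towards the end of the 14 s trial, and returns to baseline well before the next 32 s have elapsed.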

First-level contrast images were generated for the main effects of 
timing and auditory presence, and the interaction between these 2 
experimental factors (testing for a larger effect of auditory presence 
during isochronous than random streams). Contrast images were also 
created for each of the 4 conditions to be used in a second-level 
random-effects ANOVA for a conjunction analysis. Additionally, 
a contrast image for the main effect of task (collapsed across all 4 
conditions) versus null trials allowed identification of peak voxel 
activations in sensory cortices (left/right occipital lobe, left/right 
superior temporal gyrus) responding to our stimuli during the task. 
First-level contrasts were estimated according to the general linear 
model for each participant. 
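
The main-effect and interaction contrasts described above can be written as weight vectors over the 4 condition regressors. The sketch below is illustrative only: the condition ordering, the dictionary keys, and the example beta values are assumptions, not values from the study.

```python
# Assumed ordering of the four condition regressors.
CONDITIONS = ("V_ISO", "V_RAND", "VA_ISO", "VA_RAND")

CONTRASTS = {
    # Main effect of timing: isochronous > random, collapsed over sound.
    "timing": (1, -1, 1, -1),
    # Main effect of auditory presence: audiovisual > vision-only.
    "sound": (-1, -1, 1, 1),
    # Interaction: (VA_ISO - VA_RAND) - (V_ISO - V_RAND).
    "interaction": (-1, 1, 1, -1),
}


def contrast_value(weights, betas):
    """Weighted sum of condition betas for one voxel or ROI."""
    return sum(w * betas[name] for w, name in zip(weights, CONDITIONS))


# Hypothetical betas in which isochrony helps only when sounds are present.
example_betas = {"V_ISO": 1.0, "V_RAND": 1.0, "VA_ISO": 2.0, "VA_RAND": 1.5}
example = {k: contrast_value(w, example_betas) for k, w in CONTRASTS.items()}
```

A positive interaction value indicates a larger isochrony effect in the audiovisual than in the vision-only condition, which is the direction tested here.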



Brain-Behavior Relation in Sensory Cortices 
The participant-specific peak voxels for task trials versus null trials 
within the left and right occipital lobes and superior temporal gyri were 
taken to represent stimulus-responsive visual and auditory cortex for 
each participant. Beta parameter estimates for effects of isochrony versus 
random timing were extracted for such peak voxels. A participant- 
by-participant robust regression analysis (MATLAB robust-fit function, 
default bi-square option) was then performed, relating the change in beta 
parameter values to the change in visual target sensitivity (d') 
for the main effect of timing (ISO > RAND) for each sensory region. 
A positive relationship was anticipated, to reflect greater enhancement 
in brain activity relating to better performance. Note that using the 
robust-fit function guarded against any such brain-behavior relations 
being driven by unrepresentative outliers. 
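
The idea behind a bisquare robust regression can be sketched with iteratively reweighted least squares. The code below is a simplified illustration, not MATLAB's routine: it omits the leverage adjustment that MATLAB applies, the function names are hypothetical, and the tuning constant 4.685 and the MAD/0.6745 scale estimate are the conventional choices for bisquare weighting.

```python
from statistics import median


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def bisquare_fit(x, y, tune=4.685, n_iter=50):
    """Robust line fit: points with large residuals are down-weighted."""
    w = [1.0] * len(x)
    for _ in range(n_iter):
        slope, intercept = weighted_line(x, y, w)
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        scale = median(abs(r) for r in resid) / 0.6745  # robust sigma
        if scale < 1e-12:  # essentially perfect fit; stop iterating
            break
        cut = tune * scale
        w = [(1 - (r / cut) ** 2) ** 2 if abs(r) < cut else 0.0
             for r in resid]
    return slope, intercept
```

With one grossly deviant data point, an ordinary least-squares fit is pulled towards that point, whereas the reweighted fit assigns it a weight near zero and follows the remaining points.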



Whole Brain Analysis 

First-level contrast images for each participant were entered into 
a second-level random-effects analysis for statistical assessment across 
participants (Friston et al. 1995). Second-level t-tests were performed 
for main effects and interaction contrasts. A second-level repeated 
measures 2 x 2 ANOVA (stimulus timing x presence of auditory tone) 
was generated, and a conjunction analysis was performed to assess 
regions with common activations for the effect of timing on vision-only 
trials (V_ISO > V_RAND) and on audiovisual trials (VA_ISO > VA_RAND). Voxel 
threshold was set at P_unc < 0.001 and only significant clusters surviving 
correction for multiple comparisons (n > 20; P_FWE < 0.05) are reported 
for whole brain analysis. Peak locations for all significant clusters are 
reported in MNI space. 

ROI Analysis 

An a priori ROI analysis was performed on multisensory pSTS, as 
identified by Noesselt et al. (2007; see also Marchant et al. 2011). 
Noesselt et al. showed this region was modulated by the temporal 
properties of audiovisual stimuli contralateral to a peripheral visual 
stimulus. Given the central stimuli presented in the current study, an 
8-mm sphere was centered on their peak voxel location in left (x = -54, 
y = -50, z = 8) and right (x = 60, y = -48, z = 12) pSTS. The average 
beta parameter values for each ROI were extracted for each participant 
using the MarsBaR toolbox (Brett et al. 2002) and entered into a 2 x 2 
repeated measures ANOVA (stimulus timing x presence of auditory 
tone) in SPSS, with post hoc t-tests performed. 
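
A spherical ROI of this kind amounts to selecting every voxel whose centre lies within a fixed distance of the peak coordinate. The sketch below is illustrative and does not reproduce MarsBaR: it assumes that "8-mm sphere" refers to the radius, that voxel centres lie on a 3-mm grid aligned to the MNI origin, and the function name sphere_voxels is hypothetical.

```python
import math

LEFT_PSTS = (-54, -50, 8)   # ROI centres given in the text (MNI, mm)
RIGHT_PSTS = (60, -48, 12)


def sphere_voxels(center_mm, radius_mm=8.0, voxel_mm=3.0):
    """Return voxel centres (mm) within radius_mm of center_mm.

    Assumptions: voxel centres sit at integer multiples of voxel_mm,
    and radius_mm is the sphere radius rather than its diameter.
    """
    lo = [math.ceil((c - radius_mm) / voxel_mm) for c in center_mm]
    hi = [math.floor((c + radius_mm) / voxel_mm) for c in center_mm]
    voxels = []
    for i in range(lo[0], hi[0] + 1):
        for j in range(lo[1], hi[1] + 1):
            for k in range(lo[2], hi[2] + 1):
                v = (i * voxel_mm, j * voxel_mm, k * voxel_mm)
                if math.dist(v, center_mm) <= radius_mm:
                    voxels.append(v)
    return voxels


def roi_mean(betas_by_voxel, voxels):
    """Average the beta values over the voxels in the ROI."""
    return sum(betas_by_voxel[v] for v in voxels) / len(voxels)
```

The per-participant ROI average returned by roi_mean corresponds to the single value per condition that enters the 2 x 2 ANOVA.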

Results 

Behavioral Results 

Stimulus timing influenced visual intensity-target detection 
sensitivity (d'; F(1,15) = 51.1, P < 0.001; Fig. 1d) and RTs (F(1,15) = 
9.6, P = 0.007; Fig. 1e). Visual sensitivity was improved and 
reaction times faster for the isochronous than the random 
timing condition. The presence of an accompanying tone did 
not significantly influence either visual performance measure. 

fMRI Results 

Presence of an Auditory Tone Activates Auditory Cortices 
Unsurprisingly, auditory cortices along bilateral superior 
temporal gyri (STG; including both Heschl's gyri and the 
planum temporale) were more active during audiovisual than 
vision-only trials (left STG: cluster P_FWE < 0.001, 823 voxels, 
peak t(15) = 5.55, x = -54, y = -33, z = 12; right STG: cluster P_FWE < 
0.001, 750 voxels, peak t(15) = 8.60, x = 63, y = -24, z = 12). No 
brain regions were more active during the vision-only trials. 

Isochrony Activated Network of Timing Regions 
Isochrony enhanced activity in bilateral IFG, insula, putamen 
and globus pallidus, left DLPFC, and left intraparietal sulcus 
(IPS) when compared with random timing (Table 1; Fig. 2a-c). 
A conjunction analysis between the simple effect of isochronous 
versus random timing on BOLD signal during the 
vision-only and audiovisual conditions ([V_ISO > V_RAND] and 
[VA_ISO > VA_RAND]) confirmed common activation of the right 
anterior insula by isochrony regardless of sound presence/ 
absence (cluster P_FWE = 0.029, 72 voxels, peak t(15) = 4.52, x = 30, 
y = 24, z = 6; Fig. 2d,e). A similar pattern of activity was 
observed in the left anterior insula (Fig. 2d,e) but that cluster 
did not reach full statistical significance and is reported only 
for completeness (cluster P_FWE > 0.05, 8 voxels, peak t(15) = 4.08, 
x = -30, y = 21, z = 6). No regions were preferentially activated 
for random versus isochronous stimuli. 

Positive Brain-Behavior Relation for Main Effect of Timing 
in Sensory Cortices 

Task-related (experimental trials > null) peak voxels in bilateral 
visual (occipital lobe) and auditory (STG) cortices were identified 
for each participant (Table 2; Fig. 3) and beta parameter estimates 
extracted. There was a positive relation between change in 
behavioral performance (visual target d') and change in activity 
in the right occipital lobe (Fig. 3d) and bilateral STG (Fig. 3a,b) 



Table 1 

Brain regions more active during isochronous than pseudorandomly timed stimulus trains 

                          Cluster             Peak voxel 
Region                    P_FWE     Voxels    t(15)     x      y      z 
L DLPFC                   0.017     31        5.09     -36     36     18 
L IFG                                         4.72     -45     39     12 
L insula (posterior)      0.024     29        4.75     -39      3     -3 
L insula (anterior)       <0.001    59        5.69     -30     21      3 
L putamen                                     4.75     -24     15      3 
L intraparietal sulcus    0.017     59        4.79     -42    -42     36 
R IFG                     0.005     39        5.22      57     15      6 
R insula (anterior)       <0.001    267       8.60      30     24      3 
R insula (posterior)                          5.08      42      6      0 
R globus pallidus                             4.57      18      0      3 
R putamen                                     4.02      18     12     -3 

Note: Main effect of isochrony > random timing conditions, collapsed across presence or 
absence of an accompanying auditory tone ([V_ISO + VA_ISO] > [V_RAND + VA_RAND]). Peak voxel 
locations reported in MNI coordinates. Thresholds: voxel P_unc < 0.001 and cluster P_FWE < 0.05. 



for the main effect of timing (isochrony > random; Table 2). 
There was also a trend toward the same positive linear relation 
in left occipital cortex (Fig. 3c). Participants with a greater 
isochrony-induced improvement in performance displayed 
greater activity enhancement in both visual and auditory sensory 
cortices for the same contrast. To better understand the relation 
between visual task performance and auditory cortex responses, 
we repeated the robust-fit regression analysis with the trials 
separated according to presence or absence of the auditory 
tone ([V_ISO > V_RAND] and [VA_ISO > VA_RAND]). The only remaining 
significant positive correlation in auditory cortex was observed 
for the right STG during the audiovisual conditions (slope = 
1.51, step = -1.66, t(15) = 3.0, P = 0.047). 

Interaction between Timing and Presence of an Auditory 
Tone 

Whole brain analysis for the interaction contrast ([VA_ISO > 
VA_RAND] > [V_ISO > V_RAND]) did not identify any regions showing 
a significantly greater isochrony enhancement in the audiovisual 
than vision-only condition. However, the a priori ROI in 
multisensory left pSTS previously identified to be modulated by 
audiovisual timing in Noesselt et al. (2007; 8-mm sphere 
centered at x = -54, y = -50, z = 8; Fig. 4b) did show a substantial 
trend toward an interaction effect (F(1,15) = 3.6; P = 0.077) that is 
reported for completeness, as well as a significant main effect 
of timing (F(1,15) = 6.3; P = 0.024). Post hoc t-tests confirmed that 
isochrony (vs. random timing) enhanced BOLD signal in left 
pSTS when visual stimuli were accompanied by an auditory 
tone (t(15) = 3.7, P = 0.002) but not when presented alone 
(t(15) = 0.5, P = 0.619, n.s.; Fig. 4a). Activity in left pSTS was higher 
during the multisensory isochronous condition than in all others 
(VA_ISO > V_ISO: t(15) = 2.3, P = 0.035; VA_ISO > V_RAND: t(15) = 2.1, 
P = 0.048). Activity in the ROI in right pSTS also showed a main 
effect of timing (F(1,15) = 5.8; P = 0.029; ISO > RAND), but there 
was no trend toward an interaction with audiovisual synchrony 
(F(1,15) = 0.2; P = 0.668, n.s.). This concurs with a left 
lateralisation for this multisensory integration site with 
centrally presented audiovisual stimuli (e.g., Calvert 2001; 
Macaluso et al. 2004). 

Whole brain analysis for the opposite interaction contrast 
([VA_ISO < VA_RAND] > [V_ISO < V_RAND]) identified a greater effect 
of random (vs. isochronous) timing in the audiovisual than 
vision-only condition for activity in the right STG (cluster 
P_FWE = 0.030, 25 voxels, peak t(15) = 6.75, P_unc < 0.001, x = 63, 
Figure 2. Brain activity enhanced by isochronous stimulus timing. (a) Isochrony versus random timing enhanced activity in bilateral IFG, insula, putamen, and globus pallidus; (b) in 
left DLPFC; and (c) in left IPS, when collapsed across visual (V) and audiovisual (VA) conditions. (d) Conjunction analysis confirmed overlap (purple shading) between isochrony 
enhancement effects on vision-only (V; blue shading) and audiovisual (VA; red shading) conditions in bilateral insula. (e) Cluster mean beta parameters (±1 s.e.d. for isochrony effect) 
plotted for each condition (light bars = isochronous; dark bars = random). Thresholds: voxel P_unc < 0.001 and cluster P_FWE < 0.05 displayed on mean anatomical brain images. 



Table 2 

Task-related peak voxel location and robust regression with task performance for isochrony effect 

                             Voxel position in MNI coordinates (mm)        Regression results 
Sensory cortices             x              y              z               Slope    Step     t(15)    P-value 
L occipital lobe             -14.8 ± 7.8    -93.8 ± 5.0    5.0 ± 10.2      1.27     -0.27    1.5      0.083 
R occipital lobe             16.8 ± 7.3     -91.1 ± 4.9    3.6 ± 12.8      2.20     -0.65    2.5      0.012* 
L superior temporal gyrus    -55.5 ± 7.7    -18.9 ± 8.8    4.3 ± 4.5       1.89     -1.08    2.3      0.019* 
R superior temporal gyrus    60.9 ± 5.1     -17.1 ± 7.9    5.4 ± 4.4       1.62     -1.17    1.9      0.041* 

Note: Group mean (± standard deviation) task-related peak voxel locations in left and right sensory cortices reported in MNI coordinates. Robust-fit regression analysis results reported for participant- 
by-participant positive linear relation between change in beta parameter estimates from these peak voxels and change in visual target detection sensitivity (d'), for the contrast isochronous > random timing. 
* = significant regression (P < 0.05). 



y = -15, z = 0; Fig. 4d). Post hoc paired t-tests (using extracted 
mean cluster beta parameter estimates) confirmed this was 
driven by significantly greater activity during presentations 
with random than isochronous timing when visual stimuli 
were accompanied by a synchronous auditory tone (t(15) = 5.3, 
P < 0.001) but not when they were presented alone (t(15) = -0.5, 
P = 0.651, n.s.; Fig. 4c). This presumably represents an auditory 
response to unpredictably timed sounds. 

Discussion 

This study investigated the influence of temporal structure 
(isochronous vs. random) for a visual stimulus train on visual 
intensity-target detection and brain activity; and any multisen- 
sory impact of adding sounds temporally coincident with each 
visual event. Highly regular isochronous timing enhanced visual 
target detection sensitivity and speeded detection responses, 
when compared with random timing. Temporal predictability 
also increased BOLD signals in an extended network that is 
involved in temporal processing, including bilateral IFG, insula 
and putamen, and left IPS and DLPFC. There was a positive 
correlation between the participant-by-participant behavioral 
isochrony effect for target detection and the corresponding 
isochrony effect on activity in visual and auditory cortices 
involved by the task. It is noteworthy that "auditory" cortex, as 
well as visual, correlated with the impact of regular timing on 
"visual" performance, when concurrent sounds were present. 
Moreover, a multisensory ROI in the left pSTS showed higher 
activation during isochronous than random timing specifically 
for the audiovisual condition. 

The behavioral finding of enhanced visual target detection 
and speeding of reaction times in the current study, for 
isochronous versus random conditions, is in general accord 
with other studies showing that temporal predictability can aid 
visual task performance (Bertelson 1967; Bertelson and 
Tisseyre 1968; Niemi and Näätänen 1981; Coull and Nobre 
1998; Coull et al. 2000; Griffin et al. 2001; Nobre 2001; Correa 
et al. 2004, 2005; Martens and Johnson 2005; Davranche et al. 
Figure 3. Positive linear relation between changes in performance and beta parameter estimates in sensory cortices for isochrony. Task-related peak voxels were identified in 
bilateral auditory (superior temporal gyrus, STG) and visual (occipital lobe) cortices per participant. Isochrony-induced voxel beta parameter change in (a) left and (b) right STG, or 
(c) left and (d) right occipital lobe are plotted against change in target detection sensitivity (d') for the same isochrony versus random timing contrast ([VA_ISO + V_ISO] > [VA_RAND 
+ V_RAND]). One data point plotted per participant (n = 16) with the dashed line representing the robust-fit linear regression result. Individual peak sensory task-related voxel MNI 
coordinates (x- and y-axis) are plotted in the central figure, collapsed in the z-axis, superimposed on a mean anatomical scan (at z = 3) for illustrative purposes. Please note that 
these sensory voxels were selected a priori, before examining the behavioral results; see main text. 
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Figure 4. Auditory tone modulates impact of visual stimulus timing on pSTS ROI. (a) 
Multisensory ROI in left pSTS showed greater isochrony enhancement when visual 
stimuli were accompanied by a synchronous auditory tone (VA), but not when 
presented alone (V). (£>) The 8-mm sphere ROI was centered atx = -54, y = -50, 
i = 8, a location previously identified to be modulated by temporal properties of long 
audiovisual stimulus steams (Noesselt et al. 2007; see also Marchant et al. 201 1 ). By 
contrast, (c and d) whole brain analysis revealed a region in the right superior 
temporal gyrus (STG) that showed the opposite interaction pattern, with greater 
enhancement for random timing during the audiovisual (VA) than the vision-only (V) 
condition, (a and c) Group mean (±1 s.e.d. for isochrony effect) beta parameter 
values plotted for each condition (light bar = isochronous; dark bars = random). \b) 
Shows ROI; (d) shows significant cluster from whole-brain analysis. Both are 
displayed on the mean anatomical image. * = Significant post hoc paired f-test 
[P < 0.05). 

2011); but our study differs in several key respects. Here, 
performance was improved when visual targets were embed- 
ded within a highly regular (thus temporally predictable) 
extended isochronous stimulus train, compared with trains 
with random timing. While several other studies have used 
a preceding sequence of events to build up a temporal 
expectation for the onset of a target (Jones et al. 2002; Doherty 
et al. 2005; Rimmele et al. 2011), they did so for the final event 
in a predictable sequence, thereby defining which item was the 
target. In contrast, here we embedded target events at random 
positions within the sequence of an extended stimulus train 
while varying the temporal structure of that train. Unlike Doherty 
et al. (2005), we were able to show that temporal predictability 
can improve sensitivity to visual target events. Another key 
feature of our paradigm was that the property that defined visual 
targets (i.e., intensity) was fully orthogonal to the timing 
manipulation. Thus, the regular timing in the isochronous 
streams gave no information about which items were targets 
and which were nontargets, yet the regular temporal structure 
nevertheless still improved visual performance objectively. 

Turning to brain activations for the isochronous versus 
pseudorandomly timed streams, parietal cortex, and a wider 
corticostriatal network were preferentially activated during the 
isochronous case. Parietal cortex has often been implicated in 
temporal orienting and temporal judgments (Coull and Nobre 
1998; Coull et al. 2001; Assmus et al. 2003, 2005; Wiener et al. 
2010; Cotti et al. 2011; Davranche et al. 2011) but is not 
commonly reported during (primarily auditory) beat percep- 
tion studies. The enhanced response observed here in the IPS 
presumably reflects the highly predictable nature of visual 
event onset within the isochronous stimulus train. The left 
lateralisation of this IPS response would be in keeping with the 
present task involving only implicit temporal demands (for 
reviews, see Coull and Nobre 2008; Wiener et al. 2010), since 
our participants were never directed to concentrate on 
temporal structure of the stimuli but instead performed an 
orthogonal intensity-target detection task. 

Activity in bilateral putamen, IFG, and insula was also 
enhanced during isochronous versus random stimulus timings, 
when collapsed across presence of an accompanying auditory 
tone. This fits with previously reported preferential responses 
for these 3 regions during rhythmic stimuli containing strong 
beats, compared with complex or random timing for auditory 
sequences (Grahn and Brett 2007; Bengtsson et al. 2009; Grahn 
and Rowe 2009; Teki et al. 2011); but we now show this 
extends to a visual task. Moreover, we demonstrate that the 
same network is activated even when the role of timing is only 
implicit to the task performed, unlike the majority of previous 
beat-processing studies (Grahn and Brett 2007; Grahn and 
Rowe 2009; Teki et al. 2011). 

The importance of the basal ganglia, which include the 
putamen and globus pallidus regions implicated here, in 
detection of temporal structure is indicated by reduced temporal 
perceptual performance by Parkinsonian patients (Artieda et al. 
1992; Pastor et al. 1992; Rammsayer and Classen 1997; 
Harrington et al. 1998; Malapani et al. 1998; Grahn and Brett 
2009; Wojtecki et al. 2011). Moreover, previous exposure to an 
auditory beat sequence has been shown to enhance activity in 
bilateral putamen during subsequent visual beat perception tasks 
(Grahn et al. 2011). So this structure may provide one common 
site where timing information from different senses may be 
combined (Buhusi and Meck 2005; Meck 2006). 

The impact of isochrony on BOLD signal was also observed 
bilaterally in IFG and left DLPFC. These regions have often been 
recruited during timing tasks (e.g., Rao et al. 2001; Macar et al. 
2002; Lewis and Miall 2003, 2006). It has been proposed that 
the right inferior prefrontal cortex is involved in general time 
measurement (Lewis and Miall 2006) or plays a monitoring role 
during temporal expectation (Vallesi et al. 2007); whereas the 
left frontal operculum has more specifically been implicated in 
temporal sequence discrimination (Schubotz et al. 2000; 
Schubotz and von Cramon 2001) and beat perception strength 
(Grahn and McAuley 2009). Bengtsson et al. (2009) reported 
increasing activation of both left IFG and DLPFC, as well as the 
insula, for stimuli with increasing temporal predictability 
during passive listening to auditory sequences. 

The insula was the only region significantly enhanced by 
isochrony during both visual-only and audiovisual presentations 
here, as identified using a conjunction analysis. This would fit 
with previous studies reporting recruitment of the insula 
during temporal judgment tasks for both auditory (Ferrandez 
et al. 2003; Livesey et al. 2007; Morillon et al. 2009; Herdener 
et al. 2009) and visual stimuli (Rao et al. 2001; Nenadic et al. 
2003; Herdener et al. 2009; see review Kosillo and Smith 2010), 
and in perception of rhythm for extended stimulus trains 
(Schubotz et al. 2000). Although observed bilaterally for the 
insula, the impact of isochrony here was somewhat stronger for 
the right insula, which is preferentially responsive to simple 
compared with complex auditory sequences (Grahn and Brett 
2007). Here, we show this is also the case for visual stimuli, 
irrespective of whether they were presented alone or 
accompanied by a synchronous tone and when timing was 
implicit to the task performed (i.e., visual intensity-target 
detection). 

There was no such main effect of isochrony on BOLD signal 
in sensory cortices but rather the strength of isochrony- 
induced enhancements in visual (occipital lobe) and auditory 
cortices (STG) correlated positively with the improvement in 
visual task performance (d′) for isochronous streams. The 
intriguing correlation of auditory cortex with the impact of 
isochrony on the visual task was found only in the presence of 
concurrent sounds. We propose that these effects reflect 
sensory encoding of the regular temporal properties of the 
isochronous streams, which went on to enhance performance 
in the (orthogonal) visual detection task. The involvement of 
auditory cortex when concurrent sounds were presented 
presumably indicates that the temporal structure of events in 
this additional (but task-irrelevant) modality was also encoded, 
even though the task-relevant target could only arise within the 
visual modality. 
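
As a rough illustration of the brain-behavior relation described 
above, the following sketch computes d′ from hit and false-alarm 
rates and correlates a per-participant isochrony effect on d′ with 
the matching isochrony effect on beta estimates. The function 
names, the clipping constant eps, and the use of a plain Pearson 
correlation are assumptions made for illustration only; the 
analysis reported here used a robust-fit linear regression, which 
this sketch does not reproduce. 

```python
import math
from statistics import NormalDist


def d_prime(hit_rate, fa_rate, eps=1e-3):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Rates are clipped to [eps, 1 - eps] so the inverse normal CDF stays
    finite; the clipping rule is an assumption, not the paper's method.
    """
    h = min(max(hit_rate, eps), 1 - eps)
    f = min(max(fa_rate, eps), 1 - eps)
    z = NormalDist().inv_cdf
    return z(h) - z(f)


def isochrony_effect(iso, rand):
    """Per-participant difference: isochronous minus random."""
    return [i - r for i, r in zip(iso, rand)]


def pearson_r(x, y):
    """Plain Pearson correlation (stand-in for the robust fit)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Under these assumptions, one would pass the per-participant d′ 
differences and beta differences (each from isochrony_effect) to 
pearson_r; a positive value corresponds to the direction of the 
relation reported for visual and auditory cortices. 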

A ROI analysis in left pSTS (site taken from Noesselt et al. 
2007) revealed a trend interaction, with preferential activation 
for stimuli with isochronous rather than random timing only 
when the inputs were multisensory (i.e., auditory tones 
present). Multisensory pSTS has long been implicated as an 
audiovisual integration site (e.g., see Calvert 2001; Beauchamp 
et al. 2004; Bischoff et al. 2007; Hein et al. 2007; Meienbrock 
et al. 2007; Stevenson and James 2009; Stevenson et al. 2010; 
Werner and Noppeney 2010; James et al. 2011). This region is 
thought to receive input from both sensory cortices (Seltzer et al. 
1996; Lewis and van Essen 2000) and functional connectivity 
with these regions can be modulated by correspondence 
between multisensory inputs (Lewis and Noppeney 2010). The 
specific ROI location used here is influenced by the relative 
timing between auditory and visual stimulus trains (Noesselt et al. 
2007; used by Marchant et al. 2011). 

In the current study, synchronous central audiovisual pre- 
sentation enhanced activity in the left pSTS region compared 
with unisensory presentation and this was more pronounced 
for stimuli with predictable than unpredictable timing. 
Furthermore, the impact of temporal predictability on this 
region was restricted to audiovisual presentation, not unisen- 
sory visual stimuli. Left lateralisation of the influence of 
temporal structure on pSTS would be in keeping with other 
audiovisual timing studies using centrally presented stimuli 
(Calvert et al. 2001; Macaluso et al. 2004). This might 
potentially reflect a similar impact of implicit timing on the 
left hemisphere, as observed for inferior parietal cortex (Coull 
and Nobre 1998; Wiener et al. 2010), except specifically 
constrained to multisensory stimulation. 
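
The main effect of isochrony and the timing-by-modality interaction 
discussed for pSTS can be written as contrast weights over the four 
conditions. The sketch below is illustrative only: the condition 
labels, their ordering, and the helper apply_contrast are 
assumptions and are not taken from the analysis pipeline used in 
this study. 

```python
# Hypothetical condition order for the 2 x 2 design
# (modality: VA vs. V; timing: isochronous vs. random).
CONDITIONS = ("VA_iso", "VA_rand", "V_iso", "V_rand")

# Main effect of isochrony: [VAiso + Viso] > [VArand + Vrand].
MAIN_ISOCHRONY = (1, -1, 1, -1)

# Interaction: (VAiso - VArand) - (Viso - Vrand).
INTERACTION = (1, -1, -1, 1)


def apply_contrast(betas, weights):
    """Weighted sum of condition betas; betas maps condition name to value."""
    return sum(betas[name] * w for name, w in zip(CONDITIONS, weights))
```

With this convention, a positive interaction value means that the 
isochrony enhancement is larger with sounds than without (the pSTS 
pattern), and a negative value corresponds to the reverse pattern. 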

One other region showed an impact of timing restricted to 
audiovisual presentations but for the reverse contrast. Random 
versus isochronous timing enhanced activity in the right STG 
but only when the visual stimuli were accompanied by 
synchronous tones. Teki et al. (2011) also reported heightened 
response to random compared with isochronous auditory 
stimuli in STG bilaterally but more posterior (x = 66, y = -39, 
z = 3) to our peak locus (x = 63, y = -15, z = 3). In a location more 
similar to that observed here (x = 66, y = -22, z = 2), Overath 
et al. (2007) reported increased activity in the planum 
temporale of the right STG that correlated with increasing 
entropy (decreasing predictability) for sequences of tones with 
different pitches. These results together with our current 
findings indicate enhanced BOLD signal in auditory cortex for 
conditions with more auditory disorder (i.e., higher unpredict- 
ability), apparently irrespective of whether this is defined in the 
temporal domain as used here or in the pitch domain for 
Overath et al. (2007). The planum temporale in the STG has 
been proposed as a computational hub for spectrotemporally 
complex auditory information (Griffiths and Warren 2002). 
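
The spatial comparison between these peaks can be checked with a 
simple Euclidean distance. The sketch below uses the three 
coordinates quoted above; the constant names are illustrative, and 
it assumes that all three peaks are expressed in millimetres in a 
comparable standard space, which the cited studies would need to 
confirm. 

```python
import math

# Peak coordinates (x, y, z) as quoted in the text above.
OUR_PEAK = (63, -15, 3)
TEKI_2011 = (66, -39, 3)
OVERATH_2007 = (66, -22, 2)


def distance_mm(a, b):
    """Euclidean distance between two (x, y, z) coordinates."""
    return math.dist(a, b)


if __name__ == "__main__":
    print(round(distance_mm(OUR_PEAK, TEKI_2011), 1))
    print(round(distance_mm(OUR_PEAK, OVERATH_2007), 1))
```

Under that assumption, the peak reported here lies roughly 24 mm 
from the Teki et al. (2011) coordinate and roughly 8 mm from the 
Overath et al. (2007) coordinate, consistent with the description 
of the latter as the more similar location. 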

To conclude, isochronous (vs. random) temporal structure 
for stimulus trains enhanced detection of embedded unisen- 
sory visual intensity targets and increased activity in a cortico- 
striatal network and the IPS. A positive relation was observed 
between isochronous versus random behavioral effects on 
visual target detection and activity in sensory cortices. Two 
regions showed an impact of timing limited to audiovisual 
presentations: predictable timing enhanced activity in multi- 
sensory pSTS, while random timing enhanced activity in the 
planum temporale. We believe this is the first evidence that the 
influence of temporal encoding in multisensory integration is 
not restricted to the relative timing between inputs from 
different modalities, but also depends on the predictable 
nature of component events within each sensory modality. 